375 research outputs found

    An intuitive Python interface for Bioconductor libraries demonstrates the utility of language translators

    BACKGROUND: Computer languages can be domain-related, and in the case of multidisciplinary projects, knowledge of several languages will be needed in order to quickly implement ideas. Moreover, each computer language has relative strong points, making some languages better suited than others for a given task. The Bioconductor project, based on the R language, has become a reference for the numerical processing and statistical analysis of data coming from high-throughput biological assays, providing a rich selection of methods and algorithms to the research community. At the same time, Python has matured as a rich and reliable language for the agile development of prototypes or final implementations, as well as for handling large data sets. RESULTS: The data structures and functions from Bioconductor can be exposed to Python as a regular library. This allows a fully transparent and native use of Bioconductor from Python, without one having to know the R language and with only a small community of translators required to know both. To demonstrate this, we have implemented such Python representations for key infrastructure packages in Bioconductor, letting a Python programmer handle annotation data, microarray data, and next-generation sequencing data. CONCLUSIONS: Bioconductor is no longer reserved for R users. Building a Python application using Bioconductor functionality can be done just as if Bioconductor were a Python package. Moreover, similar principles can be applied to other languages and libraries. Our Python package is available at: http://pypi.python.org/pypi/rpy2-bioconductor-extensions/

    Ten Simple Rules for Getting Help from Online Scientific Communities

    The increasing complexity of research requires scientists to work at the intersection of multiple fields and to face problems for which their formal education has not prepared them. For example, biologists with little or no background in programming now often use complex scripts to handle the results of their experiments; conversely, programmers wishing to enter the world of bioinformatics must know about biochemistry, genetics, and other fields. In this context, communication tools such as mailing lists, web forums, and online communities acquire increasing importance. These tools permit scientists to quickly contact people skilled in a specialized field. A question posed properly to the right online scientific community can help in solving difficult problems, often faster than screening literature or writing to publication authors. The growth of active online scientific communities, such as those listed in Table S1, demonstrates how these tools are becoming an important source of support for an increasing number of researchers. Nevertheless, making proper use of these resources is not easy. Adhering to the social norms of World Wide Web communication, loosely termed “netiquette”, is both important and non-trivial. In this article, we take inspiration from our experience on Internet-shared scientific knowledge, and from similar documents such as “Asking the Questions the Smart Way” and “Getting Answers”, to provide guidelines and suggestions on how to use online communities to solve scientific problems.

    A single fungal strain was the unexpected cause of a mass aspergillosis outbreak in the world's largest and only flightless parrot.

    Kākāpō are a critically endangered species of parrot restricted to a few islands off the coast of New Zealand. Kākāpō are very closely monitored, especially during nesting seasons. In 2019, during a highly successful nesting season, an outbreak of aspergillosis affected 21 individuals and led to the deaths of 9, leaving a population of only 211 kākāpō. In monitoring this outbreak, cultures of Aspergillus were grown and their genomes sequenced. These sequences demonstrate that, very unusually for an Aspergillus outbreak, a single strain caused the outbreak. This strain was found on two islands, but only one had an outbreak of aspergillosis, indicating that the strain was necessary, but not sufficient, to cause disease. Our analysis provides an understanding of the 2019 outbreak and suggests potential ways to manage such events in the future.

    TaxMan: a taxonomic database manager

    BACKGROUND: Phylogenetic analysis of large, multiple-gene datasets, assembled from public sequence databases, is rapidly becoming a popular way to approach difficult phylogenetic problems. Supermatrices (concatenated multiple sequence alignments of multiple genes) can yield more phylogenetic signal than individual genes. However, manually assembling such datasets for a large taxonomic group is time-consuming and error-prone. Additionally, sequence curation, alignment and assessment of the results of phylogenetic analysis are made particularly difficult by the potential for a given gene in a given species to be unrepresented, or to be represented by multiple or partial sequences. We have developed a software package, TaxMan, that largely automates the processes of sequence acquisition, consensus building, alignment and taxon selection to facilitate this type of phylogenetic study. RESULTS: TaxMan uses freely available tools to allow rapid assembly, storage and analysis of large, aligned DNA and protein sequence datasets for user-defined sets of species and genes. The user provides GenBank format files and a list of gene names and synonyms for the loci to analyse. Sequences are extracted from the GenBank files on the basis of annotation and sequence similarity. Consensus sequences are built automatically. Alignment is carried out (where possible, at the protein level) and aligned sequences are stored in a database. TaxMan can automatically determine the best subset of taxa to examine phylogeny at a given taxonomic level. By using the stored aligned sequences, large concatenated multiple sequence alignments can be generated rapidly for a subset and output in analysis-ready file formats. Trees resulting from phylogenetic analysis can be stored and compared with a reference taxonomy. CONCLUSION: TaxMan allows rapid automated assembly of multigene datasets of aligned sequences for large taxonomic groups. By extracting sequences on the basis of both annotation and BLAST similarity, it ensures that all available sequence data can be brought to bear on a phylogenetic problem, but remains fast enough to cope with many thousands of records. By automatically assisting in the selection of the best subset of taxa to address a particular phylogenetic problem, TaxMan greatly speeds up the process of generating multiple sequence alignments for phylogenetic analysis. Our results indicate that an automated phylogenetic workbench can be a useful tool when correctly guided by user knowledge.
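
    The supermatrix idea described in this abstract (concatenating per-gene alignments, with some genes unrepresented for some taxa) can be illustrated with a minimal Python sketch. This is not TaxMan's code: the function name build_supermatrix, the dictionary layout, and the choice to pad missing genes with gap characters are all assumptions made for illustration.

```python
from typing import Dict, List


def build_supermatrix(
    alignments: Dict[str, Dict[str, str]],
    taxa: List[str],
    gap: str = "-",
) -> Dict[str, str]:
    """Concatenate per-gene alignments into one row per taxon.

    `alignments` maps gene name -> {taxon: aligned sequence}. A taxon that
    is missing from a gene is padded with gap characters (an assumption of
    this sketch) so that every row of the result has the same length.
    """
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    # Process genes in sorted order so the column layout is reproducible.
    for gene in sorted(alignments):
        per_taxon = alignments[gene]
        lengths = {len(seq) for seq in per_taxon.values()}
        if len(lengths) != 1:
            raise ValueError(f"gene {gene!r} is not aligned: lengths {sorted(lengths)}")
        width = lengths.pop()
        for taxon in taxa:
            rows[taxon].append(per_taxon.get(taxon, gap * width))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}


# Hypothetical toy data: two genes, three taxa, each gene missing one taxon.
example = {
    "cox1": {"A": "ATG-", "B": "ATGC"},
    "18S": {"A": "GG", "C": "GA"},
}
matrix = build_supermatrix(example, ["A", "B", "C"])
for taxon, row in matrix.items():
    print(taxon, row)
```

    Padding with gaps keeps every taxon in the matrix even when a gene is unrepresented, which is the situation the abstract identifies as making manual assembly error-prone.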

    PseudoGeneQuest – Service for identification of different pseudogene types in the human genome

    BACKGROUND: Pseudogenes, nonfunctional copies of genes, evolve fast due to the lack of evolutionary pressures and thus appear in several different forms. PseudoGeneQuest is an online tool to search the human genome for a given query sequence and to identify different types of pseudogenes as well as novel genes and gene fragments. DESCRIPTION: The service can detect pseudogenes that have arisen either by retrotransposition or segmental genome duplication, many of which are not listed in the public pseudogene databases. The service has a user-friendly web interface and uses a powerful computer cluster in order to perform parallel searches and provide relatively fast runtimes despite exhaustive database searches and analyses. CONCLUSION: PseudoGeneQuest is a versatile tool for detecting novel pseudogene candidates from the human genome. The service searches human genome sequences for five types of pseudogenes and provides an output that allows easy further analysis of observations. In addition to the result file, the system provides visualization of the results linked to the Ensembl Genome Browser. The PseudoGeneQuest service is freely available.

    Distribution of Introns in Fungal Histone Genes

    Saccharomycotina and Taphrinomycotina lack introns in their histone genes, except for an intron in one of the histone H4 genes of Yarrowia lipolytica. On the other hand, Basidiomycota and Pezizomycotina have introns in their histone genes. We compared the distributions of 81, 47, 79, and 98 introns in the fungal histone H2A, H2B, H3, and H4 genes, respectively. Based on the multiple alignments of the amino acid sequences of histones, we identified 19, 13, 31, and 22 intron insertion sites in the histone H2A, H2B, H3, and H4 genes, respectively. Surprisingly, only one intron hot spot in the histone H2A gene is shared between Basidiomycota and Pezizomycotina, suggesting that most of the introns of Basidiomycota and Pezizomycotina were acquired independently. Our findings suggest that the common ancestor of Ascomycota and Basidiomycota may have had a few introns in the histone genes. In the course of fungal evolution, Saccharomycotina and Taphrinomycotina lost the histone introns; Basidiomycota and Pezizomycotina acquired other introns independently. In addition, most of the introns have sequence similarity among introns of phylogenetically close species, strongly suggesting that horizontal intron transfer events between phylogenetically distant species have not occurred recently in the fungal histone genes.

    Expedited batch processing and analysis of transposon insertions

    BACKGROUND: With advances in sequencing technology, greater and greater amounts of eukaryotic genome data are becoming available. Often, large portions of these genomes consist of transposable elements, frequently accounting for 50% or more in vertebrates. Each transposable element family may have thousands or tens of thousands of individual copies within a given genome, and therefore it can take an exorbitant amount of time and effort to process data in a meaningful fashion. FINDINGS: In order to combat this problem, we developed a set of bioinformatics techniques and programs to streamline the analysis. This includes a unique Perl script which automates the process of taking BLAST, RepeatMasker and similar data to extract and manipulate the hit sequences from the genome. This script, called Process_hits, uses an object-oriented methodology to compile all hit locations from a given file for processing, organize this data into usable categories, and output it in multiple formats. CONCLUSIONS: The program proved capable of handling large amounts of transposon data in an efficient fashion. It is equipped with a number of useful sub-functions, each of which is contained within its own sub-module to allow for greater expandability and as a foundation for future program design.
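
    The general task this abstract describes, taking hit locations from a search report and pulling the matching sequences out of the genome, can be sketched in a few lines of Python. This is not the authors' Perl script Process_hits. The function names are hypothetical, and the sketch assumes the standard 12-column BLAST tabular layout (-outfmt 6), where the subject id, subject start and subject end sit in columns 2, 9 and 10, and where a start greater than the end marks a minus-strand hit.

```python
from typing import Dict, Iterable, List, Tuple

# Translation table for complementing DNA (unambiguous bases only).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_hits(
    tabular_lines: Iterable[str],
    genome: Dict[str, str],
) -> List[Tuple[str, int, int, str, str]]:
    """Return (subject_id, start, end, strand, sequence) for each hit.

    Coordinates are 1-based and inclusive, as in BLAST tabular output.
    `genome` maps sequence id -> sequence and is assumed to fit in memory.
    """
    hits = []
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        subject = fields[1]
        s_start, s_end = int(fields[8]), int(fields[9])
        lo, hi = min(s_start, s_end), max(s_start, s_end)
        seq = genome[subject][lo - 1:hi]
        strand = "+" if s_start <= s_end else "-"
        if strand == "-":
            seq = reverse_complement(seq)
        hits.append((subject, lo, hi, strand, seq))
    return hits


# Hypothetical toy input: one plus-strand and one minus-strand hit.
toy_genome = {"chr1": "AACCGGTTAC"}
toy_report = [
    "q1\tchr1\t100.0\t4\t0\t0\t1\t4\t3\t6\t1e-5\t8.0",
    "q1\tchr1\t100.0\t4\t0\t0\t1\t4\t10\t7\t1e-5\t8.0",
]
for hit in extract_hits(toy_report, toy_genome):
    print(hit)
```

    For genomes with tens of thousands of copies per element family, a real pipeline would likely index the genome on disk rather than hold it in a dictionary; the in-memory version is kept here only for brevity.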
